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Computer Searches With Results Prioritized Using 
Histories Restricted By Query Context and 
User Community 

by Allen Kai-Lang Yu 

5 [0001] BACKGROUND OF THE INVENTION 

[0002] The present invention relates to computing and, more 
particularly, to computer-based searches. A major objective of the 
present invention is to provide for more effective prioritizing of 
search results. 

10 [0003] Computers have made searching for computer-based 
documents and other items a routine task. For example, a user 
today routinely searches millions of documents stored on a 
network or on the Internet based on just a small set of keywords 
that the user enters. Often, the main problem is not finding items, 

/ 5 but how to manage the abundance of items found in the search. In 
general, it is not useful to present a very large number of search 
response items in a substantively arbitrary order, e.g., alphabetical 
order. 

[0004] Many search engines urge and instruct users to better 
20 frame their search requests to limit the number of irrelevant 
response items. However, such well-framed requests are not 
always feasible. Also, many users are discoiu-aged by having to 
recast a search request; and they may not make the effort to 
develop more sophisticated search skiUs. Accordingly, it has fallen 
25 on the search engine to implement some filtering or prioritization 
strategy. 
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[0005] A search engine can prioritize based on item content: 
for example, items in which search terms appear the most times or 
most frequently can be ranked higher than other items. More 
sophisticated content-related prioritization can use probabilistic 
5 categorization techniques, such as Bayesian networks, to rank an 
item's priority based a relevance probability derived from the 
results of matching the document's word distribution to 
strategically selected word patterns. Typically, items assigned a 
high priority are presented to the user before (or higher on a list) of 
/ 0 response items. 

[0006] Alternatively or in addition, a search engine can 
exclude items that otherwise meet the search criteria based on user 
classification. For example, a search for "mouse" might exclude 
items relating to computer-input devices when a user is classified 
J 5 as "interested in pets". The class can be user-selected (as a button 
on a search form) or assigned by the search engine-e.g., on the 
basis of a pre-existing user profile. 

[0007] Meta-categorization - the association of additional 
information with documents to support a more refined and 

20 semantically based categorization scheme - can be used to further 
define relevancy with respect to user classificatbn. For example, 
keywords, e.g., "pet community", that appear to search engines but 
not in the body of the item itself can be added to a document's 
meta-description to indicate whether that document referring to a 

25 "mouse" might be interesting to the pet ODmmunity, but not 
necessarily to the computer community. However, entering and 
updating meta-keywords for a large and rapidly changing set of 
search items can be very costly. If the search engine is to cater to 
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diverse communities, the maintaining of sudi meta-categorization 
information is often not feasible. 

[0008] Search engines can use search histories to aid in 
prioritization without requiring prior metacategorization or 
5 classification of documents. In historical prioritizing, a search 
engine tracks the collective users' responses to search items. 
Indication of interest in specific searchresponse items can be 
defined, for example, as the act of a user clicking on a selected item 
for further inspection. If another user makes the same search, 
10 items indicated as interesting by previous users can be given higher 
priority. As more users make the same search, a statistical profile 
of items of interest can be accumulated and used for future 
prioritization. 

[0009] The foregoing search approaches can be used 
75 individually or combined. For example, a search engine can limit 
the response items to those having matching metakeywords and 
then prioritize based on the interest indications by prior users. 
However, even when used collectively, it can be difficult to achies/e 
relevant prioritizations tailored to the specific interests of each 
20 user. What is needed is a search approach that can provide more 
personalized, relevant and useful prioritizations of search items. 

[0010] SUMMARY OF THE INVENTION 

[0011] The present invention provides a search engine that 
prioritizes items returned in response to a present search request 
25 by a present user according to an algorithm that assigns greater 
weight to interest indications by similar users making similar 
requests than to interest indications by users that are dissimilar or 
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that have made dissimilar requests. In a simple realization, the 
user is assigned to a unique community and the search request is 
assigned to a unique query context. Only interest indications made 
by members of the same community in the same query context are 
5 considered in prioritizing search results. Alternatively, users or 
requests or both can be classified hierarchically, and interests 
weighted as a function of the specificity of the community and/or 
query contexts. 

[0012] The method of the invention involves assigning the 
J 0 user to a community and the request to a query context, submitting 
a query associated with the query context and retrieving 
corresponding response items, prioritizing the response items 
based on a community and query context, and recording interest 
indications with respect to a community and query context for 
7 5 future prioritizations. The "search request" can be a natural 
language request made by a user, while the "query" is a formal 
search string (e.g.. Boolean expression) submitted by the search 
engine to institute the search. Typically, many similar but distinct 
search requests can be assigned to a common query context; in this 
20 sense, the query defines a "context" for a search request. 

[0013] Search histories are evaluated on a per-community 
basis. Interest indications can be tracked per community. 
Alternatively, taterest indication can be tracked per user, and the 
community interest data can be derived from the user interest data 
25 either on demand or at regular intervals. Tracking interest by 
community enables anonymous search and makes sense when 
users frequently and freely change communities from search to 
search. Tracldng interest by individual users, however, enables 
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search engines to discover communities c^amically so that user 
classifications can be defined automatically. 

[00141 Whether interest indications are tallied per community 
or per user, the invention also provides for "decaying" interest 
5 indications. Decaying permits newer interest indications to be 
weighted more heavily than older interest indications in prioritizing 
items. The decay can be implemented by associated a time with 
each hit or each hit count and using that time in determining a new 
hit count or a new hit-count weighting. 

W [00151 Prior art methods of prioritizing search results usmg 
search histories classify interest indications in one dimension, e.g. 
query context. The present invention considers a twodimensional 
context - referred hereto as the search context, which is composed 
of the query context and the community context - in prioritizing 

7 5 search-response items. In other words, the present invention 
tracks histories for each combination of query context and 
community context. Both query context and community context 
are used to determine weightings for interest indications (or in the 
special case, to determine which mterest indications are 

20 considered). This two-dimensional classification scheme 
substantially enhances the pertinence of the prioritization to the 
present user. 

[00161 Relative to prior art methods in which search items are 
pre-determmed or a pnori classified, the present invention provides 
25 a mechanism for communities of end users to collectively and 
dynamically evaluate and define the relevance of search items for 
their own use. The evaluation is automatic, not requiring pre 
classification by those responsible for the search engme or the 
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items being searched. Besides offering a more flexible mechanism 
for defining relevancy, the proposed methods also offers a more 
responsive mechanism to meet today's rapidly changing search 
conditions. These and other featm-es and advantages of the 
5 invention are apparent from the description below with reference 
to the following drawings. 

[001 7] BRIEF DESCRIPTION OF THE DRAWINGS 

[0018] FIGURE 1 is a schematic illustration of a search system 
in accordance with the present invention. 

10 [0019] FIGURE 2 is a diagram of a relational database of the 
present invention incorporated in the search system of FIG. 1. 

[0020] FIGURE 3 is a method of the invention practiced in the 
context of the search system of FIG. I. 

[0021] DETAILED DESCRIPTION OF THE PREFERRED 
15 EMBODIMENTS 

[0022] In accordance with the present invention, a search 
system API comprises a search engine 11, search input 
functions 13, a relational database 15, a temporary responseitem 
list 17, and user-response functions 19, as shown in FIG. 1. Search 

20 engine 11 interacts with all elements of search systemAPl. Search 
input fimctionsl3 include a search-request input function 21, a 
re que St- to- query assignment function 23, and a user-to-community 
assigimaent function 25. User-response functions 17 include a 
prioritize- and-display function 41, a hit-tracking function 43, and a 

25 hit-count-update function 45. 
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[0023] Relational database 15 is shown in greater detail in 
FIG. 2 comprising a query table 31, a community table 33, a user 
table 35, a user-community table 37, and a hit-count table 39. 
Query table 31 has a query serial number (QuerylD) as a key field 
5 and query string field (QueryString) and a parentquery field 
(ParentQuerylD). The query serial number for a search request is 
determined by request-query-assignment function 23. The query 
string field indicates the actual query string to be submitted by 
search engine 11 in executing the search. The parentquery field 
/ 0 allows query table 3 1 to refer to itself recursively to allow for the 
definition of hierarchical query contexts. 

[0024] Community table 33 uses a community serial number 
(CommunitylD) as a key and and includes a communityname field 
(CommunityName), a parent-community field 

15 (ParentCommunitylD), and a decay-factor field 
(HalfLifeDecayFactor). The community name is used in the user 
interface. For example, the community name may be the name 
presented to allow a user to select a community. User table 3 5 has 
a user serial number key field (UserlD) and user names are given in 

20 a UserName dependent field. The user fields are analogous to the 
counterpart community fields in community table 3 3. The parent- 
community field allows the commimity table to refer to itself 
recursively to allow hierarchical community contexts to be defined. 
The decay-factor field specifies the way in which interest 

25 indications are aged so that newer indications are weighted more 
than older indications. 

[0025] User-community table 37 has a UserCommunitylD key 
field, and has UserlD and CommunitylD as dependent fields. These 
dependent fields are related to key fields, respectively, for user 
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table 35 and community table 33. Each record in user-community 
table 37 associates a user with, a community. However, a user can 
be associated with different communities in different records; of 
course, a community can be associated with different users in 
5 different records. User-community table 37 thus allows users to 
belong to more than one community for a search. Embodiments 
permitting only one community per user can dispense with a user 
community table and just include a community dependent field in 
the user table. 

10 [0026] Hit-count table 39 provides the actual history data for 
prioritizing response items. The key field is a hit-count serial 
number (HitCountID); a hit count (HitCount) is determined for each 
hit-count serial number. Obviously, each hit count is associated 
with a specific response item (ItemID). Also, each hit count is 

7 5 associated with a specific query context (QuerylD) so there are 
different counts for an item retrieved in response to distinct 
queries. The present invention further provides that each hit count 
is associated with a specific community, so there are different hit 
counts for different communities for the same items aid query 

20 contexts. 

[0027] Hit-count table 39 also associates a count date 
(CountDate) with each hit count. The date is the date of the most 
recent previous hit. The count date is used to apply an exponential 
decay factor when calculating a hit count. In an alternative 
25 embodiment, the hit count is merely a tally of hits. The illustrated 
embodiment employs the decay factor to give higher ranking to 
more recent hits, and increasing lesser weighting to "stale" hits. 
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[0028] Search engine API implements a method Ml flow 
charted in FIG. 3. At step SI, a search request is received by search 
request function 21. This can involve presenting a user a search 
form that the user can fill out. The user can use the form to submit 
5 a search request. For example, a user can submit a search request 
by typing in search terms such as "networking printers". 
Optionally, the search form can present the user with a list of 
communities, e.g., "system administrators", "computer vendors", 
"end users" to join to help focus the search. 

10 [00291 At step S2, the search request is assigned to a query 
context by request-query assignment function 23 and the user is 
assigned to a community by user-community assignment 
function 25. Request-query assignment function serves to parse 
the human input string (the search string) into a more 

15 mathematical, machinereadable format (the query string). For 
example, the search string "networking printers" can be parsed and 
translated to the query string "network & printer" by a Boolean 
parser. Other search strings, such as "network printers", "printers 
on a network", etc., can map to the same query string. The query 

20 string defines a query context, which can be hierarchical; hence a 
"network & printer" query might be assigned as a sub query context 
of the "printer" query. The categorization of search requests into 
contexts allows for more effective recording and retrieval of 
relevant hit counts for prioritization in future searches. 

25 [0030] Also at step S2, the user is assigned to a community. 
The assignment can be based on the community selected at request 
time - i.e. the community specified on the search form (optionally) 
at step SI. For example, a user might be a member of a user 
community as opposed to a marketing community and would 
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explicitly make the appropriate selections on the search form 
before performing a search. This selection of community will affect 
the ordering of items returned in the search. 

[0031] Alternatively, the assignment of community can be 
5 determined by a profile, accessible to the search engine, associated 
with the user. The profile may be one directly submitted by the 
user, or indirectly submitted by an organization of which the user 
is a member. 

[00321 Furthermore, the communities can be predefined as 
10 above or generated adaptively. For example, a search system can 
look for patterns in users' collective search history to discover new 
communities with similar search patterns. In the illustrated 
embodiment, both the communities and the query contexts are 
hierarchical. This allows for a weltrounded prioritization of search 
J 5 results even when a given community or query context contains a 
relatively sparse history. 

[0033] In the illustrated embodiment, a user is allowed to join 
any number of communities and a community can consist of an 
arbitrary number of members. The abihty to join multiple 

20 communities allows users to search under multiple contexts. This 
can be useful as in the case when a road bicyclist is searching for a 
racing event by searching under the contexts of both "road 
bicyclists" and "bicychst racing." Alternatively, this approach 
allows for users that change communities. For example, a user may 

25 elect a "pet" community when exploring mice as pets and a 
"computer" community when the user is interested in cursor 
control devices. 
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[0034] At step S3, the search is initiated by submitting the 
query string and search-response items are collected. The search 
items meeting the search criterion are gathered in temporary 
response-item table 17. The search items can belong to a formal 
5 database such as a corporate knowledge database or can be 
documents residing on the Internet, which are not arranged in a 
formal, centralized database. 

[0035] The collected items stored in responseitem table 17 
are assigned record numbers, in this case an ItemlD. In addition, 

/ 0 the Universal Resource Locator (URL) for the document is stored. In 
addition, an item-content value can be stored as a prioritization 
factor. This can be the same value used in the prior art used for 
prioritization. For example, the number of occurrence of each 
search term can be considered, along with Bayesian considerations 

/ 5 of the proximity with which search words appear within an item. 

[0036] Response-item table 17 also includes a hit count field, 
but this is not determined from the document itself, but by 
accessing a hit-count table 39 in step S4. Hit-Count table 39 
presents hit counts as a function of search item, query, and 

20 community. For each item in responseitem table 17, table 39 is 
searched to see if there is a hit count for the presently assigned 
community and context. If there is, that hit count is entered into 
the %HitCount field of response-item table 17. In hit-count 
table 39, the queries are identified by the record number used in 

25 table 31, and communities are identified by the record number 
used in table 33. If no matching item is found in hit-count table 39, 
the hit count is zero. Of course, where hierarchical contexts (either 
query or community), the hit count is only zero If there is no 
matching item for the current context and all associated contexts. 

// 
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[0037] The items are then prioritized and displayed at step S5. 
In the illustrated embodiment, the items can be sorted simply by 
hit coimts. Alternatively, the content values for the items can be 
factored in. Thus, for two items with the same hit count, the oie 
5 with the greater content value is given priority. 

[0038] Once presented with the prioritized list of response 
items, the user may select some of them for further investigation. 
For example, each item may be presented with a page title and an 
excerpt from the page including one or more of the search terms. 
10 A user can review these "extracts", and cUck on promising items to 
go to the associated URL. Each such pursuit is tracked as a hit at 
step S6. 

10039] At step S7, hit coimts are updated in response to 
interest indications apphed to the present response items. In the 

7 5 present case, the hit count for an item is updated in response to a 
hit of that item by adding 1 to a time-adjusted version of the old 
count. Of course, the count date is then updated to the present as 
well. If an item that is not represented in hit-count table 39 
receives a hit, that item is then added to hit-count table with a hit 

20 count of 1; the present date is set as the count date. In alternative 
embodiments, decay is ignored and a hit count tdly can simply be 
incremented. Either way, the interest indications shown in the 
present search are available for prioritizing search items in future 
(indicated by dashed arrow) searches involving the same 

25 community and query context. 

[0040] If query contexts are hierarchical, the returned results 
can be ordered in a more well-rounded manner. Interest 
indications associated with ancestor query contexts can be factored 
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into a hit count in addition to the interest indications associated 
with the present query context. Preferably, the interest indications 
are weighted less the more generations away the associated 
ancestor context is to the present query context. For performance, 
5 some arbitrary cutoff can be used to limit the number of 
generations considered in the count. For example, suppose that the 
history for the query context "network printers" has not been 
estabhshed for a given community. Instead of returning a count of 
0, the system might query the parent context of "network printers", 
10 e.g., "printers", which might have a more estabhshed history to 
derive a related count. 

[0041] In a manner analogous to that described above for 
query contexts, communities can be hierarchical so that parent 
community contexts can be used to return a more welkounded 

7 5 search result as well. Consider the case where a new community 
called "water painters" is created under a parent community called 
"visual artists." Initially, when little history is established for the 
"water painters" community, the history of the parent community 
"visual artists" can be used to aid in prioritizing a search result for 

20 the "water painter" community. A weighting system similar to that 
used in nested query contexts, where weights are based on distance 
from the current context, can be used here as well For both 
community contexts and query contexts, it can be advantageous to 
include child contexts of the present context (and even of ancestor 

25 contexts) as well as ancestor contexts for the calculation of the 
count. 

[0042] A common problem, regardless of whether hierarchical 
contexts are used, relates to the general mechanism of recording 
hit counts in search histories. If hit counts are not properly retired, 
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outdated, formerly popular Items are continually prioritized over 
more relevant and current items; this occurs because it takes time 
to register enough counts for the currently popular items to 
overtake counts of the old item. The invention provides for a 
5 mechanism of aging hit counts systematically and automatically by 
recording the date the count was last updated and systematically 
updating the count each time the count is to be incremented by the 
formula: 

[0043] count = 1 + old_HitCount*(e**-.69*t/HalfLifeDecayFactor)). 

1 0 [0044] More generally, the decay factor can be any function of 
time that varies from 1 to 0 as time varies from 0 to infinity. This 
scheme effectively allows recent hits to be given more weight, as 
would be appropriate, for example, when a new product appears on 
the scene, and allows "stale" hits to be retired from consideration 

/ 5 in the prioritizing of search results. 

[0045] There are two main methods by which one can record a 
search history that can be retrieved for future prioritization. As 
mentioned above, a search history can be estabhshed on either a 
per-user or a per-community basis. Where history is kept on a per- 
20 user basis, the community history can be calculated by aggregating 
the histories of its member users. 

[0046] One advantage of per-user history approach is that, as 
users join and leave communities, the community history 
dynamically reflects the collective histories of its current users, 
25 including the effects caused by users joining and leaving the 
community. If a user with an estabhshed search history joins a 
new community, that community benefits from his entire history. 
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Another advantage of keeping history on a per-user basis is that it 
allows the possibility to have a system discover new communities 
of users. In per-user tracking, the community history can be 
derived - either dynamically with each search request or 
periodically at fixed predetermined intervals - from the aggregate 
user histories. The exact mechanism of how a community history 
is derived from its user histories is a performance based 
implementation detail not central to the current invention. 

[0047] A problem with per-user based history is that a user 
may have multiple interests, and his/her search history would 
reflect an amalgamation of those interests. As a result, the history 
of each of the communities to which he belongs will similarly 
reflect the amalgamation of his disparate iiterests. For example, if 
a user, who is both a linux user and a cat lover, joins the linux and 
cat communities, the linux community will register the user's cat 
related search history (and vice versa) as well! One viable solution 
is to more finely track a user's history based on the category of 
communities also. Another is to record history on a per 
community basis. 

[0048] The primary advantage of keeping a communitybased 
history is the "non-sticky" way in which counts are recorded. 
Counts are recorded only to the current communities associated 
with the current search. This way, users are free to change 
interests many times and often simply by changing in and out of 
communities while the focus of communities stay relatively focused 
and steady. On the other hand, recording in a sticky-way allows a 
user to carry the user's past histories along to the new 
communities as the user joins and leaves communities. In such 
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cases, the communities benefits (gains new history) as the user 
moves in and out of communities 

[0049] To get the best of both sticky and nonrsticky ways of 

recording hits one can record on a per-user and per-community 
5 basis. This allows user history to be portable as the user joins and 
leaves communities while solving the amalgamations of interest 
problem identified earlier with respect to per-user tracking. In 
most applications though, it is believed community history tracking 
gives the best performance/feature combination. Hence, In the 
W illustrated embodiment, history is recorded on a percommunity 
basis. 

[0050] In the illustrated embodiment, the search is conducted 
over the Internet, and items do not belong to a common formal 
database and are best considered as external to search system API. 
15 However, the invention is also applicable to searches of data h 
formal, centralized databases such as a corporate knowledge 
system. In that case, the item database can more fairly be 
considered part of the search system. 

[00511 The present invention has industrial applicability to 
20 searches of documents and other items over the Internet or 
arranged in formal databases. The foregoing and other variations 
upon and modifications to the illustrated embodiment are provided 
for by the present invention, the scope of which is defined in the 
following claims. 

25 [00521 What Is Claimed Is : 
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